Temporal Data Mining of Scientific Data Provenance
Authors
Abstract
Provenance of digital scientific data is an important part of a data object's metadata. It can, however, grow voluminous quickly because it may be captured at a fine level of granularity, and it can also be feature-rich. We propose a representation of provenance data based on logical time that reduces the feature space. We create time-domain and frequency-domain representations of the provenance and apply clustering, classification, and association rule mining to these abstract representations to determine the usefulness of the temporal representation. We evaluate the temporal representation using an existing 10 GB database of provenance captured from a range of scientific workflows.
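The abstract does not say how the time-domain and frequency-domain representations are built. As a minimal sketch of the general idea only, and not the authors' method, the Python below assumes provenance events arrive as hypothetical `(logical_time, event_type)` pairs, counts events per logical time step to form a time-domain series, and takes the magnitudes of a naive discrete Fourier transform as a frequency-domain view. The function names, the event layout, and the sample trace are all assumptions made for illustration.

```python
import cmath


def time_domain(events):
    """Count provenance events per logical time step.

    `events` is a list of (logical_time, event_type) pairs. This layout is
    an assumption of the sketch, not something the abstract specifies.
    """
    if not events:
        return []
    steps = max(t for t, _ in events) + 1
    series = [0] * steps
    for t, _ in events:
        series[t] += 1
    return series


def frequency_domain(series):
    """Return DFT magnitudes of `series` using a naive O(n^2) transform."""
    n = len(series)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, x in enumerate(series)))
        for k in range(n)
    ]


# Hypothetical workflow trace covering logical time steps 0..3.
events = [
    (0, "invoke"),
    (1, "read"), (1, "read"),
    (2, "write"),
    (3, "write"), (3, "write"),
]
series = time_domain(events)
spectrum = frequency_domain(series)
```

Either vector could then be passed to a clustering or classification routine as a compact feature vector in place of the full provenance record.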
Similar resources
Big Data Provenance: State-Of-The-Art Analysis and Emerging Research Challenges
This paper focuses on big data provenance issues, and provides a comprehensive survey of state-of-the-art analysis and emerging research challenges in this scientific field. Big data provenance is actually one of the most relevant problems in big data research, as confirmed by the great deal of attention devoted to this topic by larger and larger database and data mining research co...
From Scientific Workflow Patterns to 5-star Linked Open Data
Scientific workflow management systems have been widely adopted by data-intensive science communities. Many efforts have been dedicated to the representation and exploitation of provenance to improve reproducibility in data-intensive sciences. However, few works address the mining of provenance graphs to annotate the produced data with domain-specific context for better interpretation and shar...
Provenance for Data Mining
Data mining aims at extracting useful information from large datasets. Most data mining approaches reduce the input data to produce a smaller output summarizing the mining result. While the purpose of data mining (extracting information) necessitates this reduction in size, the loss of information it entails can be problematic. Specifically, the results of data mining may be more confusing than...
Towards Low Overhead Provenance Tracking in Near Real-Time Stream Filtering
Data streams flowing from the physical environment are as unpredictable as the environment itself. Radars go down, long-haul networks drop packets, and readings are corrupted on the wire. Yet data-driven scientific models and data mining algorithms do not necessarily account for these inaccuracies when assimilating the data. Low overhead provenance collection partially solves this problem. We...
Provenance, Tectonic Setting & Geochemical Maturity of the Early Miocene Pyawbwe Formation, Sakangyi–Thayet Area, Magway Region, Myanmar
The best-exposed Early Miocene (820 m thick) shales and interbedded silty sandstone beds of the Pyawbwe Formation in the Sakangyi-Thayet area, Magway Region, are investigated geochemically using a Siemens SRS- X Ray 303 AS XRF Spectrometer. Major and some trace element concentrations have been determined to establish their provenance, tectonic setting, paleoweathering, paleoclimate and ...